GC improvements 6: introduce batched GC #4400
Merged
teh-cmc added the ⛃ re_datastore, 🚀 performance, do-not-merge, and exclude from changelog labels, Nov 30, 2023
This was referenced Nov 30, 2023
teh-cmc force-pushed the cmc/gc_improvements_5_storediff_optimizations branch from a398980 to 69938c6, November 30, 2023 11:29
teh-cmc force-pushed the cmc/gc_improvements_6_drop_buckets branch from 6a870da to 2eb78ec, November 30, 2023 11:31
teh-cmc force-pushed the cmc/gc_improvements_5_storediff_optimizations branch from 69938c6 to 8a228a4, November 30, 2023 12:28
teh-cmc force-pushed the cmc/gc_improvements_6_drop_buckets branch from b1cbd1e to 8edab09, November 30, 2023 12:28
teh-cmc force-pushed the cmc/gc_improvements_5_storediff_optimizations branch from 8a228a4 to b5e2e3a, November 30, 2023 14:34
teh-cmc force-pushed the cmc/gc_improvements_6_drop_buckets branch from e06435d to fdc93e1, November 30, 2023 14:34
Wumpf approved these changes, Dec 1, 2023
lgtm, but you didn't lie, this one was a bit tougher!
teh-cmc force-pushed the cmc/gc_improvements_5_storediff_optimizations branch from b5e2e3a to 4e8ed7b, December 2, 2023 11:13
teh-cmc force-pushed the cmc/gc_improvements_6_drop_buckets branch from fdc93e1 to 62e5e95, December 2, 2023 11:34
teh-cmc added a commit that referenced this pull request, Dec 2, 2023

Introduce 2 new benchmark suites that drive the development of this PR series:
1. Logging tons of scalars, in order, across a bunch of series, themselves scattered across a bunch of plots.
2. Logging tons of timeless data, across a bunch of entities.

### Benchmarks

Hint: it's bad.

```
.../plotting_dashboard/drop_at_least=0.3/bucketsz=1024   1.00  1084.0±4.47ms   54.1 KElem/sec
.../plotting_dashboard/drop_at_least=0.3/bucketsz=2048   1.00      2.1±0.02s   27.6 KElem/sec
.../plotting_dashboard/drop_at_least=0.3/bucketsz=256    1.00   465.8±2.50ms  125.8 KElem/sec
.../plotting_dashboard/drop_at_least=0.3/bucketsz=512    1.00   655.3±2.61ms   89.4 KElem/sec
.../plotting_dashboard/drop_at_least=0.3/default         1.00   652.8±4.12ms   89.8 KElem/sec
.../timeless_logs/drop_at_least=0.3/bucketsz=1024        1.00      2.4±0.05s   24.2 KElem/sec
.../timeless_logs/drop_at_least=0.3/bucketsz=2048        1.00      2.4±0.03s   24.1 KElem/sec
.../timeless_logs/drop_at_least=0.3/bucketsz=256         1.00      2.5±0.08s   23.5 KElem/sec
.../timeless_logs/drop_at_least=0.3/bucketsz=512         1.00      2.4±0.02s   24.5 KElem/sec
.../timeless_logs/drop_at_least=0.3/default              1.00      2.4±0.03s   24.4 KElem/sec
```

---

Part of the GC improvements series:
- #4394
- #4395
- #4396
- #4397
- #4398
- #4399
- #4400
- #4401
teh-cmc added a commit that referenced this pull request, Dec 2, 2023

Fixes a long-standing bug: timeless tables not being sorted by `RowId`, which means they effectively always return incorrect results for out-of-order data (yes, that is a thing even in a timeless context).

This _worsens_ GC performance for timeless tables, but:
1. The performance of incorrect code hardly matters to begin with, and
2. this is groundwork for turning timeless tables into ringbuffers in an upcoming PR, which will massively improve performance.

- Fixes #1807

### Benchmarks

Hint: it's even worse!

```
group                                                    gc_improvements_0                       gc_improvements_1
-----                                                    -----------------                       -----------------
.../plotting_dashboard/drop_at_least=0.3/bucketsz=1024   1.00  1084.0±4.47ms   54.1 KElem/sec    1.03  1117.2±9.07ms   52.4 KElem/sec
.../plotting_dashboard/drop_at_least=0.3/bucketsz=2048   1.00      2.1±0.02s   27.6 KElem/sec    1.01      2.1±0.01s   27.3 KElem/sec
.../plotting_dashboard/drop_at_least=0.3/bucketsz=256    1.00   465.8±2.50ms  125.8 KElem/sec    1.01   471.5±4.76ms  124.3 KElem/sec
.../plotting_dashboard/drop_at_least=0.3/bucketsz=512    1.00   655.3±2.61ms   89.4 KElem/sec    1.02   666.7±6.64ms   87.9 KElem/sec
.../plotting_dashboard/drop_at_least=0.3/default         1.00   652.8±4.12ms   89.8 KElem/sec    1.02   665.6±4.67ms   88.0 KElem/sec
.../timeless_logs/drop_at_least=0.3/bucketsz=1024        1.00      2.4±0.05s   24.2 KElem/sec    3.35      8.1±0.10s    7.2 KElem/sec
.../timeless_logs/drop_at_least=0.3/bucketsz=2048        1.00      2.4±0.03s   24.1 KElem/sec    3.30      8.0±0.09s    7.3 KElem/sec
.../timeless_logs/drop_at_least=0.3/bucketsz=256         1.00      2.5±0.08s   23.5 KElem/sec    3.23      8.1±0.11s    7.3 KElem/sec
.../timeless_logs/drop_at_least=0.3/bucketsz=512         1.00      2.4±0.02s   24.5 KElem/sec    3.38      8.1±0.11s    7.3 KElem/sec
.../timeless_logs/drop_at_least=0.3/default              1.00      2.4±0.03s   24.4 KElem/sec    3.35      8.1±0.07s    7.3 KElem/sec
```

---

Part of the GC improvements series:
- #4394
- #4395
- #4396
- #4397
- #4398
- #4399
- #4400
- #4401
teh-cmc added a commit that referenced this pull request, Dec 2, 2023

This turns every single column in `DataStore`/`DataTable` into a ringbuffer (`VecDeque`).

This means that on the common/happy path of data being ingested in order:
1. Inserting new rows doesn't require re-sorting the bucket (that's nothing new), and
2. garbage collecting rows doesn't require re-sorting the bucket nor copying anything (that's very new).

This leads to very significant performance improvements on the common path.

- Fixes #1823

### Benchmarks

Compared to `main`:

```
group                                                    gc_improvements_0                        gc_improvements_3
-----                                                    -----------------                        -----------------
.../plotting_dashboard/drop_at_least=0.3/bucketsz=1024    4.50  1084.0±4.47ms   54.1 KElem/sec    1.00  241.0±1.66ms   243.1 KElem/sec
.../plotting_dashboard/drop_at_least=0.3/bucketsz=2048    8.86      2.1±0.02s   27.6 KElem/sec    1.00  239.9±2.70ms   244.3 KElem/sec
.../plotting_dashboard/drop_at_least=0.3/bucketsz=256     1.88   465.8±2.50ms  125.8 KElem/sec    1.00  247.4±3.94ms   236.8 KElem/sec
.../plotting_dashboard/drop_at_least=0.3/bucketsz=512     2.72   655.3±2.61ms   89.4 KElem/sec    1.00  241.2±2.06ms   243.0 KElem/sec
.../plotting_dashboard/drop_at_least=0.3/default          2.72   652.8±4.12ms   89.8 KElem/sec    1.00  239.6±1.98ms   244.6 KElem/sec
.../timeless_logs/drop_at_least=0.3/bucketsz=1024        40.21      2.4±0.05s   24.2 KElem/sec    1.00   60.3±1.16ms   972.3 KElem/sec
.../timeless_logs/drop_at_least=0.3/bucketsz=2048        40.08      2.4±0.03s   24.1 KElem/sec    1.00   60.8±1.14ms   964.3 KElem/sec
.../timeless_logs/drop_at_least=0.3/bucketsz=256         40.97      2.5±0.08s   23.5 KElem/sec    1.00   61.0±1.99ms   960.9 KElem/sec
.../timeless_logs/drop_at_least=0.3/bucketsz=512         39.45      2.4±0.02s   24.5 KElem/sec    1.00   60.6±1.45ms   966.9 KElem/sec
.../timeless_logs/drop_at_least=0.3/default              41.78      2.4±0.03s   24.4 KElem/sec    1.00   57.6±0.35ms  1018.1 KElem/sec
```

Compared to previous PR:

```
group                                                    gc_improvements_1                         gc_improvements_3
-----                                                    -----------------                         -----------------
.../plotting_dashboard/drop_at_least=0.3/bucketsz=1024     4.63  1117.2±9.07ms   52.4 KElem/sec    1.00  241.0±1.66ms   243.1 KElem/sec
.../plotting_dashboard/drop_at_least=0.3/bucketsz=2048     8.96      2.1±0.01s   27.3 KElem/sec    1.00  239.9±2.70ms   244.3 KElem/sec
.../plotting_dashboard/drop_at_least=0.3/bucketsz=256      1.91   471.5±4.76ms  124.3 KElem/sec    1.00  247.4±3.94ms   236.8 KElem/sec
.../plotting_dashboard/drop_at_least=0.3/bucketsz=512      2.76   666.7±6.64ms   87.9 KElem/sec    1.00  241.2±2.06ms   243.0 KElem/sec
.../plotting_dashboard/drop_at_least=0.3/default           2.78   665.6±4.67ms   88.0 KElem/sec    1.00  239.6±1.98ms   244.6 KElem/sec
.../timeless_logs/drop_at_least=0.3/bucketsz=1024        134.66      8.1±0.10s    7.2 KElem/sec    1.00   60.3±1.16ms   972.3 KElem/sec
.../timeless_logs/drop_at_least=0.3/bucketsz=2048        132.44      8.0±0.09s    7.3 KElem/sec    1.00   60.8±1.14ms   964.3 KElem/sec
.../timeless_logs/drop_at_least=0.3/bucketsz=256         132.22      8.1±0.11s    7.3 KElem/sec    1.00   61.0±1.99ms   960.9 KElem/sec
.../timeless_logs/drop_at_least=0.3/bucketsz=512         133.27      8.1±0.11s    7.3 KElem/sec    1.00   60.6±1.45ms   966.9 KElem/sec
.../timeless_logs/drop_at_least=0.3/default              140.04      8.1±0.07s    7.3 KElem/sec    1.00   57.6±0.35ms  1018.1 KElem/sec
```

---

Part of the GC improvements series:
- #4394
- #4395
- #4396
- #4397
- #4398
- #4399
- #4400
- #4401
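To illustrate why ringbuffer columns help on the in-order path, here is a minimal sketch. The `Bucket` type, its two columns, and the `insert`/`gc_oldest` methods are hypothetical stand-ins, not the actual `DataStore` code; only the use of `VecDeque` comes from the commit message above.

```rust
use std::collections::VecDeque;

/// Hypothetical stand-in for one bucket: parallel columns kept sorted by time.
#[derive(Debug, Default)]
struct Bucket {
    times: VecDeque<i64>,
    row_ids: VecDeque<u64>,
}

impl Bucket {
    /// In-order data is appended at the back, so the bucket stays sorted
    /// without any re-sort. Out-of-order data takes the slow path and is
    /// inserted at its sorted position.
    fn insert(&mut self, time: i64, row_id: u64) {
        if self.times.back().map_or(true, |&last| last <= time) {
            self.times.push_back(time);
            self.row_ids.push_back(row_id);
        } else {
            let idx = self.times.partition_point(|&t| t <= time);
            self.times.insert(idx, time);
            self.row_ids.insert(idx, row_id);
        }
    }

    /// Drops the `n` oldest rows by draining the front of every column.
    /// The surviving rows are neither re-sorted nor copied.
    fn gc_oldest(&mut self, n: usize) -> Vec<u64> {
        let n = n.min(self.row_ids.len());
        drop(self.times.drain(..n));
        self.row_ids.drain(..n).collect()
    }
}
```

With a plain `Vec`, removing rows from the front would shift every remaining element; with `VecDeque` the front simply advances.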
teh-cmc added a commit that referenced this pull request, Dec 2, 2023

Indexes `EntityPathHash`es alongside `TimePoint`s in the metadata registry to avoid having to run fullscans during garbage collection.

Yields some more significant wins in the common case.

### Benchmarks

Compared to `main`:

```
group                                                    gc_improvements_0                        gc_improvements_4
-----                                                    -----------------                        -----------------
.../plotting_dashboard/drop_at_least=0.3/bucketsz=1024   10.32  1084.0±4.47ms   54.1 KElem/sec    1.00  105.0±0.91ms  558.1 KElem/sec
.../plotting_dashboard/drop_at_least=0.3/bucketsz=2048   19.80      2.1±0.02s   27.6 KElem/sec    1.00  107.3±0.83ms  546.2 KElem/sec
.../plotting_dashboard/drop_at_least=0.3/bucketsz=256     4.38   465.8±2.50ms  125.8 KElem/sec    1.00  106.3±0.74ms  551.3 KElem/sec
.../plotting_dashboard/drop_at_least=0.3/bucketsz=512     6.16   655.3±2.61ms   89.4 KElem/sec    1.00  106.4±0.94ms  550.6 KElem/sec
.../plotting_dashboard/drop_at_least=0.3/default          6.34   652.8±4.12ms   89.8 KElem/sec    1.00  102.9±0.75ms  569.4 KElem/sec
.../timeless_logs/drop_at_least=0.3/bucketsz=1024        37.12      2.4±0.05s   24.2 KElem/sec    1.00   65.3±0.81ms  897.6 KElem/sec
.../timeless_logs/drop_at_least=0.3/bucketsz=2048        37.54      2.4±0.03s   24.1 KElem/sec    1.00   64.9±1.07ms  903.2 KElem/sec
.../timeless_logs/drop_at_least=0.3/bucketsz=256         38.81      2.5±0.08s   23.5 KElem/sec    1.00   64.4±0.99ms  910.2 KElem/sec
.../timeless_logs/drop_at_least=0.3/bucketsz=512         37.00      2.4±0.02s   24.5 KElem/sec    1.00   64.6±1.08ms  906.9 KElem/sec
.../timeless_logs/drop_at_least=0.3/default              36.82      2.4±0.03s   24.4 KElem/sec    1.00   65.3±1.29ms  897.3 KElem/sec
```

Compared to previous PR:

```
group                                                    gc_improvements_3                      gc_improvements_4
-----                                                    -----------------                      -----------------
.../plotting_dashboard/drop_at_least=0.3/bucketsz=1024   2.30  241.0±1.66ms   243.1 KElem/sec   1.00  105.0±0.91ms  558.1 KElem/sec
.../plotting_dashboard/drop_at_least=0.3/bucketsz=2048   2.24  239.9±2.70ms   244.3 KElem/sec   1.00  107.3±0.83ms  546.2 KElem/sec
.../plotting_dashboard/drop_at_least=0.3/bucketsz=256    2.33  247.4±3.94ms   236.8 KElem/sec   1.00  106.3±0.74ms  551.3 KElem/sec
.../plotting_dashboard/drop_at_least=0.3/bucketsz=512    2.27  241.2±2.06ms   243.0 KElem/sec   1.00  106.4±0.94ms  550.6 KElem/sec
.../plotting_dashboard/drop_at_least=0.3/default         2.33  239.6±1.98ms   244.6 KElem/sec   1.00  102.9±0.75ms  569.4 KElem/sec
.../timeless_logs/drop_at_least=0.3/bucketsz=1024        1.00   60.3±1.16ms   972.3 KElem/sec   1.08   65.3±0.81ms  897.6 KElem/sec
.../timeless_logs/drop_at_least=0.3/bucketsz=2048        1.00   60.8±1.14ms   964.3 KElem/sec   1.07   64.9±1.07ms  903.2 KElem/sec
.../timeless_logs/drop_at_least=0.3/bucketsz=256         1.00   61.0±1.99ms   960.9 KElem/sec   1.06   64.4±0.99ms  910.2 KElem/sec
.../timeless_logs/drop_at_least=0.3/bucketsz=512         1.00   60.6±1.45ms   966.9 KElem/sec   1.07   64.6±1.08ms  906.9 KElem/sec
.../timeless_logs/drop_at_least=0.3/default              1.00   57.6±0.35ms  1018.1 KElem/sec   1.13   65.3±1.29ms  897.3 KElem/sec
```

---

Part of the GC improvements series:
- #4394
- #4395
- #4396
- #4397
- #4398
- #4399
- #4400
- #4401
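As a rough sketch of the idea: if the registry stores the entity hash next to each row's timepoint, the GC can go straight to the affected entity instead of scanning every table for a `RowId`. The type aliases, the `MetadataRegistry` layout, and the method names below are simplified assumptions for illustration, not the real datastore definitions.

```rust
use std::collections::BTreeMap;

// Simplified, hypothetical stand-ins for the real types.
type RowId = u64;
type EntityPathHash = u64;
type TimePoint = Vec<(String, i64)>; // (timeline name, time) pairs

#[derive(Default)]
struct MetadataRegistry {
    /// Keyed by `RowId`, so iteration order is oldest row first.
    rows: BTreeMap<RowId, (TimePoint, EntityPathHash)>,
}

impl MetadataRegistry {
    fn insert(&mut self, row_id: RowId, timepoint: TimePoint, entity: EntityPathHash) {
        self.rows.insert(row_id, (timepoint, entity));
    }

    /// Pops the oldest row together with the entity it belongs to.
    /// The caller can then look up that entity's tables directly rather
    /// than running a full scan to find where the row lives.
    fn pop_oldest(&mut self) -> Option<(RowId, TimePoint, EntityPathHash)> {
        self.rows
            .pop_first()
            .map(|(row_id, (timepoint, entity))| (row_id, timepoint, entity))
    }
}
```

The cost is one extra hash stored per row, which may explain the small regression on the timeless benchmarks, although the commit message does not say so.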
teh-cmc force-pushed the cmc/gc_improvements_5_storediff_optimizations branch from 4e8ed7b to 1963e56, December 2, 2023 11:56
Base automatically changed from cmc/gc_improvements_5_storediff_optimizations to main, December 2, 2023 11:56
teh-cmc added a commit that referenced this pull request, Dec 2, 2023

Optimize the creation of `StoreDiff`s and `StoreEvent`s, which turns out to be a major cost in time series use cases, when it is common to generate several millions of those on any single GC run.

Once again some pretty significant wins.

### Benchmarks

Compared to `main`:

```
group                                                    gc_improvements_0                        gc_improvements_5
-----                                                    -----------------                        -----------------
.../plotting_dashboard/drop_at_least=0.3/bucketsz=1024   13.00  1084.0±4.47ms   54.1 KElem/sec    1.00  83.4±1.16ms  702.9 KElem/sec
.../plotting_dashboard/drop_at_least=0.3/bucketsz=2048   25.37      2.1±0.02s   27.6 KElem/sec    1.00  83.7±0.61ms  700.0 KElem/sec
.../plotting_dashboard/drop_at_least=0.3/bucketsz=256     5.55   465.8±2.50ms  125.8 KElem/sec    1.00  84.0±0.50ms  697.8 KElem/sec
.../plotting_dashboard/drop_at_least=0.3/bucketsz=512     7.94   655.3±2.61ms   89.4 KElem/sec    1.00  82.5±1.33ms  710.0 KElem/sec
.../plotting_dashboard/drop_at_least=0.3/default          8.02   652.8±4.12ms   89.8 KElem/sec    1.00  81.4±0.94ms  720.0 KElem/sec
.../timeless_logs/drop_at_least=0.3/bucketsz=1024        35.87      2.4±0.05s   24.2 KElem/sec    1.00  67.5±2.21ms  867.5 KElem/sec
.../timeless_logs/drop_at_least=0.3/bucketsz=2048        35.91      2.4±0.03s   24.1 KElem/sec    1.00  67.8±1.86ms  863.9 KElem/sec
.../timeless_logs/drop_at_least=0.3/bucketsz=256         37.02      2.5±0.08s   23.5 KElem/sec    1.00  67.5±1.43ms  868.2 KElem/sec
.../timeless_logs/drop_at_least=0.3/bucketsz=512         35.47      2.4±0.02s   24.5 KElem/sec    1.00  67.4±1.40ms  869.4 KElem/sec
.../timeless_logs/drop_at_least=0.3/default              36.00      2.4±0.03s   24.4 KElem/sec    1.00  66.8±0.85ms  877.3 KElem/sec
```

Compared to previous PR:

```
group                                                    gc_improvements_4                     gc_improvements_5
-----                                                    -----------------                     -----------------
.../plotting_dashboard/drop_at_least=0.3/bucketsz=1024   1.26  105.0±0.91ms  558.1 KElem/sec   1.00  83.4±1.16ms  702.9 KElem/sec
.../plotting_dashboard/drop_at_least=0.3/bucketsz=2048   1.28  107.3±0.83ms  546.2 KElem/sec   1.00  83.7±0.61ms  700.0 KElem/sec
.../plotting_dashboard/drop_at_least=0.3/bucketsz=256    1.27  106.3±0.74ms  551.3 KElem/sec   1.00  84.0±0.50ms  697.8 KElem/sec
.../plotting_dashboard/drop_at_least=0.3/bucketsz=512    1.29  106.4±0.94ms  550.6 KElem/sec   1.00  82.5±1.33ms  710.0 KElem/sec
.../plotting_dashboard/drop_at_least=0.3/default         1.26  102.9±0.75ms  569.4 KElem/sec   1.00  81.4±0.94ms  720.0 KElem/sec
.../timeless_logs/drop_at_least=0.3/bucketsz=1024        1.00   65.3±0.81ms  897.6 KElem/sec   1.03  67.5±2.21ms  867.5 KElem/sec
.../timeless_logs/drop_at_least=0.3/bucketsz=2048        1.00   64.9±1.07ms  903.2 KElem/sec   1.05  67.8±1.86ms  863.9 KElem/sec
.../timeless_logs/drop_at_least=0.3/bucketsz=256         1.00   64.4±0.99ms  910.2 KElem/sec   1.05  67.5±1.43ms  868.2 KElem/sec
.../timeless_logs/drop_at_least=0.3/bucketsz=512         1.00   64.6±1.08ms  906.9 KElem/sec   1.04  67.4±1.40ms  869.4 KElem/sec
.../timeless_logs/drop_at_least=0.3/default              1.00   65.3±1.29ms  897.3 KElem/sec   1.02  66.8±0.85ms  877.3 KElem/sec
```

---

Part of the GC improvements series:
- #4394
- #4395
- #4396
- #4397
- #4398
- #4399
- #4400
- #4401
teh-cmc force-pushed the cmc/gc_improvements_6_drop_buckets branch from 4ce3776 to e8b96da, December 2, 2023 11:57
teh-cmc added a commit that referenced this pull request, Dec 2, 2023

Adds a configurable time bound to the GC, in addition to the pre-existing space bound.

```rust
/// How long the garbage collection is allowed to run for.
///
/// Trades off latency for throughput:
/// - A smaller `time_budget` will clear less data in a shorter amount of time, allowing for a
///   more responsive UI at the cost of more GC overhead and more frequent runs.
/// - A larger `time_budget` will clear more data in a longer amount of time, increasing the
///   chance of UI freeze frames but decreasing GC overhead and running less often.
///
/// The default is an unbounded time budget (i.e. throughput only).
pub time_budget: Duration,
```

No time budget: https://github.com/rerun-io/rerun/assets/2910679/8ca63aa3-5ad4-4575-9486-21d805026c1e

3.5ms budget: https://github.com/rerun-io/rerun/assets/2910679/e1bd1a41-6353-4a0e-90e5-8c05b76e92ea

---

Part of the GC improvements series:
- #4394
- #4395
- #4396
- #4397
- #4398
- #4399
- #4400
- #4401
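A time-bounded GC loop could look roughly like the sketch below. Only the `time_budget: Duration` field is taken from the commit message; `GcOptions`, `gc_with_budget`, and the flat `VecDeque<u64>` of rows are hypothetical simplifications, and using `Duration::MAX` for "unbounded" is an assumption.

```rust
use std::collections::VecDeque;
use std::time::{Duration, Instant};

/// Hypothetical settings struct; only `time_budget` mirrors the source.
struct GcOptions {
    time_budget: Duration,
}

/// Drops up to `num_to_drop` of the oldest rows, stopping early once the
/// time budget is spent. Returns how many rows were actually dropped, so
/// the caller can resume on the next frame.
fn gc_with_budget(rows: &mut VecDeque<u64>, num_to_drop: usize, opts: &GcOptions) -> usize {
    let start = Instant::now();
    let mut dropped = 0;
    while dropped < num_to_drop {
        // Check the budget before each unit of work.
        if start.elapsed() >= opts.time_budget {
            break;
        }
        if rows.pop_front().is_none() {
            break; // nothing left to collect
        }
        dropped += 1;
    }
    dropped
}
```

A real implementation would probably check the clock less often than once per row, since reading `Instant::now()` is not free; this sketch checks every iteration to keep the control flow obvious.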
Makes the GC capable of dropping entire buckets in one go when the conditions are met (and they are pretty simple to meet in the common case of in-order data).
Unfortunately, I couldn't make the batched GC match, let alone improve on, the performance of the standard GC.
I even have a branch with a parallel batched GC, and it's still slower: the overhead of the batching data structures just kills me every time.
For that reason, batching is disabled by default.
I still want to commit the code so as to prevent it from rotting though, so we can come back to it at a later time.
This introduces a slight performance deterioration on the non-batched path; that's fine.
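For readers unfamiliar with the idea, dropping whole buckets can be sketched as follows. The PR text does not spell out the exact conditions, so this assumes one plausible condition: a bucket may be dropped wholesale when its newest row is older than a cutoff `RowId`. `Bucket`, `gc_batched`, and `cutoff` are hypothetical names, and this is not the batching data structure the PR actually uses.

```rust
use std::collections::VecDeque;

/// Hypothetical bucket: row ids sorted ascending (in-order ingestion).
struct Bucket {
    row_ids: VecDeque<u64>,
}

/// Drops every row with an id below `cutoff`. Returns the number of rows dropped.
fn gc_batched(buckets: &mut VecDeque<Bucket>, cutoff: u64) -> usize {
    let mut dropped = 0;

    // Fast path: while the oldest bucket is entirely below the cutoff,
    // drop it in one go without touching individual rows.
    loop {
        let fully_expired = match buckets.front() {
            Some(b) => b.row_ids.back().map_or(false, |&newest| newest < cutoff),
            None => false,
        };
        if !fully_expired {
            break;
        }
        if let Some(b) = buckets.pop_front() {
            dropped += b.row_ids.len();
        }
    }

    // Slow path: the first surviving bucket may straddle the cutoff,
    // so fall back to row-by-row removal for it.
    if let Some(b) = buckets.front_mut() {
        while b.row_ids.front().map_or(false, |&id| id < cutoff) {
            b.row_ids.pop_front();
            dropped += 1;
        }
    }

    dropped
}
```

With in-order data the "fully expired" check is a single comparison per bucket, which is why the condition is easy to meet on the common path.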
Benchmarks
Compared to `main`:
Compared to previous PR:
Part of the GC improvements series:
- `RowId`-ordered #4395
- `VecDeque` extensions & benchmarks #4396
- `EntityPathHash`es in metadata registry #4398
- `Store{Diff,Event}` optimizations #4399
- `time_budget` GC setting #4401

Checklist